DOCUMENT RESUME

ED 326 539                                              TM 015 183

AUTHOR         Stivers, Janet
TITLE          Relating the NTE to Research Literature on Teaching
               Effectiveness.
PUB DATE       Apr 90
NOTE           36p.; Paper presented at the Annual Meeting of the
               National Council on Measurement in Education (Boston,
               MA, April 17-19, 1990).
PUB TYPE       Speeches/Conference Papers (150)
EDRS PRICE     MF01/PC02 Plus Postage.
DESCRIPTORS    Administrators; *College Faculty; Delphi Technique;
               *Educational Research; Higher Education;
               *Instructional Effectiveness; Licensing Examinations
               (Professions); Literature Reviews; *Research
               Utilization; Teacher Certification; Teacher
               Educators; *Teacher Effectiveness; Test Items; Test
               Reliability; Test Validity
IDENTIFIERS    *National Teacher Examinations; Subject Content
               Knowledge; *Test of Professional Knowledge



ABSTRACT 

Findings of teaching effectiveness research were
related to the National Teacher Examinations (NTE) Test of
Professional Knowledge (TPK). Delphi methodology was used to classify
TPK items according to findings in a selected review of research. Six
professors in teacher education completed a questionnaire to rate
seven literature reviews. The three most highly rated were
presented to the Delphi panelists: three public school administrators
and six college teachers from various areas of teacher education. In
Phase 2, two panels of six experienced classroom teachers each
considered half of the 104 TPK items in light of the selected review.
Of the 104 TPK items, 20 were classified as supported by research;
only 9 of the 20 were judged to require knowledge of that research.
It is suggested that the TPK may be missing opportunities to measure
important aspects of teachers' professional knowledge. Alternate
strategies for measuring knowledge of teaching effectiveness research
are considered for the planned successor to the NTE. Four tables
contain study data. A 51-item list of references is included.
(SLD)






Relating the NTE to Research Literature on Teaching Effectiveness 
Janet Stivers, Marist College 

Abstract 

The purpose of this study was to relate findings from teaching
effectiveness research to the NTE Test of Professional Knowledge (TPK).
Delphi methodology was used to classify TPK items according to findings in a
selected review of research. Of the 104 TPK items, 20 were classified as
supported by research; only nine of the 20 were judged to require knowledge
of that research. One conclusion is that the TPK may be viewed as missing
opportunities to measure important aspects of teachers' professional
knowledge. Alternate strategies for measuring knowledge of teaching
effectiveness research are considered for the planned successor to the NTE.



Paper presented at the annual meeting of the 
National Council on Measurement in Education 

Boston 
April, 1990 



Of the many education reforms recommended in recent years, few have
been implemented as quickly as those calling for teacher testing. Before
1980, only three states required teacher applicants to pass initial teacher
certification tests; currently, 44 states have laws or regulations requiring
such testing. The test that is used most often is the NTE (previously known
as the National Teacher Examinations), currently used in 22 states (Rudner,
1987). Because the testing of teachers is a widespread and apparently
stable phenomenon, it is important to ensure that, among other things, the
tests that are used are well constructed.

Determining the appropriate content for a test of teachers'
professional knowledge, particularly one to be used nation-wide, presents
certain difficulties because of a lack of agreement about the appropriate
content of teacher education curricula, which have been described as
fragmented and unstable. Although there is disagreement about what should
be taught in teacher preparation programs, few would fail to accord a place
in the curriculum to conclusions drawn from research on teaching.
Correlational and experimental research conducted over the past twenty years
has produced a body of knowledge with important implications for teaching
practice (Berliner, 1984; Gage, 1978, 1985; Good, 1983; Hosford, 1984;
Hunter, 1984; B. O. Smith, 1985; D. C. Smith, 1982, 1983). Several
experimental studies have demonstrated that teachers can learn to use
recommendations drawn from research, and that those who do so have students
who make greater achievement gains than do students of other teachers (e.g.,
Anderson, Evertson, and Brophy, 1979; Borg and Ascione, 1982; Crawford,
Gage, Corno, Stayrook, and Mitman, 1978; Emmer, Sanford, Evertson, Clemens,
and Martin, 1981; Good and Grouws, 1979, 1981; Emmer, Sanford, Clemens, and
Martin, 1982; Stallings, Needels, and Stayrook, 1979; cited in Gage, 1985).
Another experimental study has demonstrated that, like practicing teachers,
preservice teachers can learn research-based techniques and apply them in
the classroom (Hindman and Polsgrove, 1988). Several researchers and
teacher educators have emphasized the importance of incorporating knowledge
derived from what is commonly called "teacher effectiveness research" into
the content of teacher education programs (Berliner, 1984; Clark, 1984;
Doyle, 1982; Egbert, 1984; Gage, 1978, 1985; Good, 1983; Hersh, 1982;
Kluender, 1984; National Commission for Excellence in Teacher Education,
1985; B. O. Smith, 1985; Stallings, 1984). Presumably, this knowledge also
should be represented on initial teacher certification tests.

It is particularly important to verify that findings from research
on teacher effectiveness, such as those identified by Brophy and Good (1984)
and Berliner (1984), are represented on the Test of Professional Knowledge
because Educational Testing Service (ETS), administrator of the NTE, has
assumed a role in shaping teacher education curricula. ETS has approved the
use of the NTE for the evaluation of teacher education programs and offers
item summary workshops to help college personnel identify curricular areas
that might be modified to improve their students' test performance.
Further, because the test is used for selection, the emphases in the test
can be viewed by education faculty and students as crucial, necessary, and
appropriate targets for instruction. Thus, the NTE can be viewed as shaping
the definition of "professional knowledge" for teachers.

The purpose of this study was to estimate the extent to which the
findings from research on teacher effectiveness are represented on the NTE
Test of Professional Knowledge. The specific purposes were:

1. to select, from several reviews of research on teacher
effectiveness, those that are most reflective of the findings which should
be included on an initial teacher certification test,

2. to identify research findings that are judged to be represented
among the items on a released form of the TPK, and

3. to identify items that are judged to be supported by research
findings included in the selected review and, for items judged supported by
research, to identify those items for which knowledge of research findings
is important in selecting the keyed option; also, to identify items that are
judged to be contradictory to research findings, that is, items for which
the research literature might suggest a different keyed answer.

The study was conducted in two phases. In Phase I, a preliminary
screening followed by a two-stage Delphi investigation was used to identify
relevant reviews of research; in Phase II, a two-stage Delphi process was
used to examine TPK items in light of the research findings included in the
review(s) selected in Phase I.



Phase I 

Preliminary Screening 

Research on teacher effectiveness has been reviewed and summarized
by several authors (e.g., Berliner, 1984; Brophy and Good, 1986; Doyle,
1986; Rosenshine and Stevens, 1986; United States Department of Education,
1986, 1987). To identify the reviews that would be most appropriate for use
in this study, a preliminary screening was conducted. First, a list of
reviews was compiled which included all those review articles (rather than
book-length works) which had been published within the previous five years,
which reviewed several studies rather than describing a single research
project, and which discussed research findings applicable to a variety of
educational settings.



Twenty university professors in teacher education, instructional
psychology, and educational psychology were then contacted by letter and
asked if they would be willing to rate such reviews on several dimensions.
They were also asked to examine the list of reviews that might be included
on a rating form, and to suggest other appropriate reviews. Fifteen
professors responded, with ten agreeing to participate and three of the ten
suggesting changes that resulted in two substitutions and two additions to
the list of reviews. The reviews rated were:

Berliner, D. C. (1984). The half-full glass: A review of research on
teaching. In P. L. Hosford (Ed.), Using what we know about teaching (pp.
51-77). Alexandria, VA: ASCD.

Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement.
In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp.
328-375). New York: Macmillan.

Doyle, W. (1986). Classroom organization and management. In M. C. Wittrock
(Ed.), Handbook of research on teaching (3rd ed., pp. 392-431). New
York: Macmillan.

Rosenshine, B., & Stevens, R. (1986). Teaching functions. In M. C. Wittrock
(Ed.), Handbook of research on teaching (3rd ed., pp. 376-391). New York:
Macmillan.

U.S. Department of Education. (1987). What works: Research about teaching
and learning (2nd ed.). Washington, D.C.: Author.

Walberg, H. J. (1986). Syntheses of research on teaching. In M. C. Wittrock
(Ed.), Handbook of research on teaching (3rd ed., pp. 214-229). New York:
Macmillan.

Wyne, M. D., & Stuck, G. B. (1982). Time and learning: Implications for the
classroom teacher. Elementary School Journal, 83, 68-75.

Results. Six professors completed a single questionnaire, using a
scale of 1 (poor) to 5 (superior), to rate the reviews on each of five
dimensions: scholarship, comprehensiveness, understandability, freedom from
reviewer bias, and emphasis on general rather than grade- or
subject-specific findings. Their ratings and the means are presented in
Table 1.

TABLE 1
RESULTS OF PRELIMINARY SCREENING: MEANS AND RATINGS

                                       Comprehen-     Understand-    Freedom From   Emphasis on        Overall
Review                  Scholarship    siveness       ability        Bias           General Findings   Mean (Sum)

Berliner, D. C.         3.25           2.75           4.00           3.25           3.50               3.35 (67)
(1984)                  3 - - 4 4 2    2 - - 4 3 2    4 - - 4 4 4    3 - - 4 4 2    4 - - 4 4 2

Brophy & Good           4.50           4.50           4.00           4.17           4.00               4.23 (127)
(1986)                  5 5 5 5 4 3    4 5 5 5 5 3    3 5 5 4 3 4    5 5 5 5 3 2    4 5 5 5 3 2

Doyle, W.               4.17           4.00           4.17           4.17           3.83               4.07 (122)
(1986)                  4 4 5 5 4 3    3 3 5 5 5 3    3 4 5 5 4 4    4 4 5 5 4 3    3 4 5 5 4 2

Rosenshine & Stevens    4.00           4.00           4.67           3.83           3.75               4.05 (121.5)
(1986)                  3 5 5 4 4 3    4 5 5 3 4 3    5 5 5 4 5 4    4 5 5 3 4 2    4 5 5 2.5 4 2

U.S. Department         2.17           3.17           4.00           2.17           3.67               3.03 (91)
of Education (1987)     2 2 2 3 3 1    5 2 3 4 3 2    5 3 3 4 5 4    2 2 2 3 3 1    5 3 3 4 4 3

Walberg, H. J.          4.33           3.67           3.00           3.83           3.83               3.73 (112)
(1986)                  3 5 5 5 5 3    4 3 4 4 4 3    2 3 3 4 4 2    3 5 5 3 4 3    4 5 5 4 3 2

Wyne & Stuck            3.00           2.50           3.50           3.50           2.75               3.05 (61)
(1982)                  3 - - 3 4 2    2 - - 3 3 2    3 - - 4 3 4    4 - - 4 4 2    3 - - 4 3 1

NOTE: Raters used the following scale: 5 - superior; 4 - above average;
3 - average; 2 - below average; 1 - poor. The ratings are presented in
order, i.e., the ratings assigned by panelist A are always in the first
position and panelist C's ratings are in the third position, etc.
A - indicates that a panelist chose not to rate a particular dimension or
review.

Variability among the raters is evident. For example, Professor C
rated five reviews, and gave three of them "perfect scores," assigning 5s on
each dimension. By contrast, in rating seven reviews, Professor F assigned
no 5s at all, and assigned 4s on only one dimension (understandability).
Within the 35 cells of the questionnaire (seven reviews, each rated on five
dimensions), there were only three instances in which a panelist assigned a
rating lower than that assigned by Professor F.

Despite the variability, there are consistencies. Each professor
assigned his or her lowest ratings to the United States Department of
Education publication, What Works (1987), on the dimensions of scholarship
and freedom from reviewer bias. No one assigned ratings of 5 to the reviews
by Berliner (1984) or Wyne and Stuck (1982). If one were to generate for
each panelist a rank order list of the reviews (produced by summing the
panelist's ratings across all the dimensions of the review), the article by
Wyne and Stuck would occupy the lowest position on each of the lists on
which it appears.

Only three reviews had mean ratings above 4 on the scale of 1 to 5;
the same three reviews also represented all those rated highest and
second-highest by each professor. Therefore, the reviews selected for use
in the next stage of Phase I were Brophy and Good's (1986) "Teacher Behavior
and Student Achievement," Doyle's (1986) "Classroom Organization and
Management," and Rosenshine and Stevens' (1986) "Teaching Functions."
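The screening rule described above (keep the reviews whose overall mean rating exceeds 4) can be retraced from the Table 1 ratings. The following is a minimal sketch, not the author's procedure: it assumes the overall mean pools every rating a review actually received across the five dimensions, which reproduces the overall figures in the table. The names `RATINGS`, `overall_mean`, and `select_reviews` are illustrative.

```python
# Ratings transcribed from Table 1, panelists A-F in order; None = not rated.
# Dimension order: scholarship, comprehensiveness, understandability,
# freedom from bias, emphasis on general findings.
N = None
RATINGS = {
    "Berliner (1984)": [
        [3, N, N, 4, 4, 2], [2, N, N, 4, 3, 2], [4, N, N, 4, 4, 4],
        [3, N, N, 4, 4, 2], [4, N, N, 4, 4, 2],
    ],
    "Brophy & Good (1986)": [
        [5, 5, 5, 5, 4, 3], [4, 5, 5, 5, 5, 3], [3, 5, 5, 4, 3, 4],
        [5, 5, 5, 5, 3, 2], [4, 5, 5, 5, 3, 2],
    ],
    "Doyle (1986)": [
        [4, 4, 5, 5, 4, 3], [3, 3, 5, 5, 5, 3], [3, 4, 5, 5, 4, 4],
        [4, 4, 5, 5, 4, 3], [3, 4, 5, 5, 4, 2],
    ],
    "Rosenshine & Stevens (1986)": [
        [3, 5, 5, 4, 4, 3], [4, 5, 5, 3, 4, 3], [5, 5, 5, 4, 5, 4],
        [4, 5, 5, 3, 4, 2], [4, 5, 5, 2.5, 4, 2],
    ],
    "U.S. Department of Education (1987)": [
        [2, 2, 2, 3, 3, 1], [5, 2, 3, 4, 3, 2], [5, 3, 3, 4, 5, 4],
        [2, 2, 2, 3, 3, 1], [5, 3, 3, 4, 4, 3],
    ],
    "Walberg (1986)": [
        [3, 5, 5, 5, 5, 3], [4, 3, 4, 4, 4, 3], [2, 3, 3, 4, 4, 2],
        [3, 5, 5, 3, 4, 3], [4, 5, 5, 4, 3, 2],
    ],
    "Wyne & Stuck (1982)": [
        [3, N, N, 3, 4, 2], [2, N, N, 3, 3, 2], [3, N, N, 4, 3, 4],
        [4, N, N, 4, 4, 2], [3, N, N, 4, 3, 1],
    ],
}


def given_ratings(dimensions):
    """Flatten one review's ratings, skipping cells a panelist left blank."""
    return [r for dim in dimensions for r in dim if r is not None]


def overall_mean(dimensions):
    """Pooled mean of all ratings a review received (assumed definition)."""
    given = given_ratings(dimensions)
    return sum(given) / len(given)


def select_reviews(ratings, cutoff=4.0):
    """Names of reviews whose pooled mean rating is above the cutoff."""
    return [name for name, dims in ratings.items() if overall_mean(dims) > cutoff]


selected = select_reviews(RATINGS)
print(selected)
# ['Brophy & Good (1986)', 'Doyle (1986)', 'Rosenshine & Stevens (1986)']
```

Skipping blank cells rather than counting them as zeros matters for the two reviews that panelists B and C did not rate; treating blanks as zeros would understate their means.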

Delphi Investigation

Panelists. Individuals participating in the Delphi portion
(Linstone and Turoff, 1975) of Phase I were asked to examine the reviews of
research by Brophy and Good, Doyle, and Rosenshine and Stevens, and to rate
each in terms of how fully it reflected the research which should be
included on a paper-and-pencil test for the initial certification of
teachers. This task required the judgment of experts; individuals invited
to participate in this portion of the study qualified as experts on the
basis of their academic credentials, experience with beginning teachers, and
knowledge of research on teaching. Each of the panelists had an earned
doctorate and five or more years' experience supervising beginning teachers,
had read two or more reviews of teacher effectiveness research, and was
familiar with the work of five or more researchers often cited in the
reviews. In addition, to ensure that the panel members represented a
variety of disciplines and were likely to have expertise in different
aspects of the teacher effectiveness research literature, the assembled
panel included specialists in: elementary reading, elementary social
studies, secondary English, secondary math, special education at the
elementary and secondary levels, and supervision and staff development at
the elementary, secondary, and district levels. Three panelists were public
school administrators: a principal, an assistant superintendent for
instruction, and an acting superintendent. The remainder were college
professors teaching content and/or methods courses. All had teaching
experience at the elementary and/or secondary level.

Procedure. Panelists were mailed copies of the reviews by Brophy
and Good, Doyle, and Rosenshine and Stevens, and asked to examine and rate
each on a scale of 1 to 7, from minimally to highly reflective of the
research on teacher effectiveness which should be included on a
paper-and-pencil test used in the initial certification of teachers.
Panelists also were asked to supplement their ratings with comments.

Ratings and comments were then returned to the Delphi coordinator,
who tabulated them and returned to each panelist a summary sheet listing,
for each review of research, the mean, median, and the first and third
quartiles for the ratings, and all comments made by panelists about the
review. Panelists were then asked to rate the reviews again in light of
this feedback. If a panelist's second rating fell below the 25th percentile
or above the 75th percentile, he or she was asked to state the reason for
assigning a rating that was markedly different from the judgment of the
group. The ratings and comments produced in this second round were then
forwarded to the Delphi coordinator for tabulation.

Because a consensus emerged from the second-round ratings, a third
round was unnecessary.
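The feedback step in this procedure (summarize a round, then flag second-round ratings that fall outside the quartiles) can be sketched as follows. This is an illustration under stated assumptions, not the coordinator's actual tabulation: the ratings are invented, the quartiles are taken from the first-round distribution, and the `inclusive` quartile method is an arbitrary choice since the paper does not say how percentiles were computed. The function names are hypothetical.

```python
from statistics import mean, median, quantiles


def summarize_round(ratings):
    """Mean, median, and first/third quartiles of one round of ratings."""
    values = sorted(ratings.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "median": median(values), "q1": q1, "q3": q3}


def needs_justification(second_round, summary):
    """Panelists whose second rating lies below Q1 or above Q3 of the summary."""
    return sorted(
        panelist
        for panelist, rating in second_round.items()
        if rating < summary["q1"] or rating > summary["q3"]
    )


# Invented ratings on the 1-7 scale for nine panelists (letters as in Table 2).
round_one = {"A": 6, "B": 5, "C": 5, "L": 6, "M": 7, "N": 4, "X": 3, "Y": 6, "Z": 6}
round_two = {"A": 6, "B": 6, "C": 5, "L": 6, "M": 7, "N": 5, "X": 4, "Y": 6, "Z": 6}

summary = summarize_round(round_one)
flagged = needs_justification(round_two, summary)
print(summary["q1"], summary["median"], summary["q3"])  # 5.0 6 6.0
print(flagged)  # ['M', 'X']
```

With small panels and a seven-point scale, many ratings tie at the quartile boundaries, so the strict inequalities here flag only panelists clearly outside the middle half of the group.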

Results. The first- and second-round ratings of each panelist are
presented in Table 2. As is most often the case in Delphi investigations,
panelists were generally responsive to the feedback and tended to change
their ratings in the direction of the perceived consensus.

The highest final ratings were assigned to the review by Brophy and
Good, which was rated above the midpoint by 8 of the 9 panelists. These
ratings contrast sharply with those for the review by Doyle, which was rated
below the midpoint by 8 of the 9 panelists, and contrast moderately with the
ratings for the Rosenshine and Stevens review, which received 5 ratings
below the midpoint, 3 ratings at the midpoint, and one rating above the
midpoint. The 19 comments which accompanied the first-round ratings were
distributed similarly: the Brophy and Good review received five positive
comments and one negative comment; the Doyle review received no positive
comments, three comments which contained both positive and negative
elements, and three negative comments; and the Rosenshine and Stevens review
received one positive, five mixed, and one negative comment. In the second
round, there were only five comments, made by three panelists; in general,
the second-round comments seem to reinforce those made in the first round.

Table 2
Ratings from Phase I Delphi Panel

Scale: 1 (minimally reflective) to 7 (highly reflective)

Review                    Round     Mean
Brophy and Good           First     [illegible]
                          Second    5.1
Doyle                     First     3.2
                          Second    2.9
Rosenshine and Stevens    First     3.8
                          Second    [illegible]

[Individual panelists' positions on the rating scale are illegible in the
source.]

Note: Panelists' ratings are represented by an identifying letter, rather
than a tally, to allow readers to compare first and second round ratings of
individual panelists. Letters were assigned to panelists as follows:

A  elementary reading
B  elementary social studies
C  elementary special education
L  secondary English
M  secondary math
N  secondary special education
X  supervision, elementary
Y  supervision, secondary
Z  supervision, district level

It had been recognized that a possible outcome of Phase I might be
the selection of more than one review for use in Phase II, and Phase II
procedures had been designed to accommodate that possibility. Because the
Brophy and Good review was the only one with a preponderance of positive
comments and the only one rated at or above the midpoint by all of the
panelists, and because these ratings contrasted considerably with the
ratings for the other two reviews, Brophy and Good's "Teacher Behavior and
Student Achievement" was the only review selected for use in Phase II.

Phase II 

Panelists. Individuals participating in Phase II of this study
considered items from the Test of Professional Knowledge and classified them
according to the findings included in "Teacher Behavior and Student
Achievement" (Brophy and Good, 1986). Each of the 12 Phase II panelists had
at least three years' experience as a classroom teacher, had served as a
cooperating teacher to undergraduate students in full-time student teaching
placements, and had received in-service training in teacher effectiveness.
Nine of the panelists had participated in a [illegible]-week, 15-hour Master
Teacher Development Course, which was designed to strengthen the skills of
cooperating teachers and included recommendations drawn from research on
teaching; the remaining three panelists had a minimum of nine hours of
in-service education in research-based teacher effectiveness strategies.

Procedure. Two panels were assembled, each with six members;
panelists teaching in the same school were assigned to different panels.
Each panel considered one-half, or 52, of the 104 items on the Test of
Professional Knowledge included in a released form of the NTE. Each Phase
II panelist received a packet containing:

- a detailed set of directions

- a copy of "Teacher Behavior and Student Achievement," the review
of teacher effectiveness research by Brophy and Good (1986) selected in
Phase I of this study

- an additional copy of the section of the review titled "Summary and
Integration of the Findings," reproduced word for word and in the same order
as the original, but organized to make the panelists' task easier by
grouping on a single page all of the research findings included in a single
category. Also, each paragraph in the summary was numbered.

- the 52 test items

- a set of eight demonstration items, with possible classifications
and comments, and

- a stamped return envelope.

Panelists were asked to examine the review thoroughly, then to read
each test item with its keyed answer and to classify it in terms of the
research findings included in the review. The available classifications
were: strongly supported by research, moderately supported by research,
unrelated to the research cited in the review, moderately contradictory to
research, or strongly contradictory to research. If the item was classified
as either supported by or contradictory to research, the panelist was asked
to identify the relevant research by writing the number appearing next to
the related paragraph in the research summary. Panelists were also invited
to comment on their classifications.

Panelists sent their item classifications and comments to the Delphi
coordinator, who tabulated the responses and returned them to the panelists.
In the second round of Phase II, each panelist received:

- a brief summary of the first-round data and directions for
classifying the items a second time, in light of the feedback from other
panelists

- a second copy of the test items, with the classifications, related
paragraphs, and comments panelists supplied for each item during the first
round, and with space for second-round classifications and comments, and

- a reaction sheet, which asked panelists to identify, from among the
test items classified as "supported by research," the items for which
knowledge of the research findings was important in selecting the keyed
option.

After completing the second-round classifications and the reaction
sheets, panelists returned them by mail to the Delphi coordinator.

Results. For the purposes of this study, an item was considered
"supported by research" if at least four of the six panel members classified
the item as either strongly or moderately supported by a particular research
finding. Using this criterion, 20 of the 104 items (19%) can be considered
supported by research. Nine of these 20 items (9% of the total items) were
judged by a majority of panelists to be items for which knowledge of the
related research finding was important in selecting the keyed option. The
research findings cited by the panelists and the test items with which they
are associated are presented in Table 3. The research finding cited most
often (about success rates and academic learning time) and the five items
associated with it are presented in Table 4.
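The four-of-six criterion can be expressed as a short tally. The sketch below is illustrative rather than the study's own tabulation: each vote is modeled as a (classification, cited finding) pair, the vote counts for items 5A and 38A follow Table 4, and votes recorded there as "related to a different research finding" are entered with an assumed strength of "moderately supported" because the table does not report it. The names `supported_finding`, `CSALT`, and `OTHER` are hypothetical.

```python
from collections import Counter

SUPPORTING = {"strongly supported", "moderately supported"}
CSALT = "consistent success/academic learning time"
OTHER = "a different research finding"  # placeholder label, not from the paper


def supported_finding(votes, threshold=4):
    """Return the finding cited in at least `threshold` supporting votes, else None.

    Each vote is a (classification, finding) pair; only "strongly" and
    "moderately supported" votes count, and they must cite the same finding.
    """
    tally = Counter(finding for label, finding in votes if label in SUPPORTING)
    if not tally:
        return None
    finding, count = tally.most_common(1)[0]
    return finding if count >= threshold else None


# Item 5A (Table 4): 3 strongly, 1 moderately, 2 related to a different finding.
item_5a = (
    [("strongly supported", CSALT)] * 3
    + [("moderately supported", CSALT)]
    + [("moderately supported", OTHER)] * 2  # strength assumed
)

# Item 38A (Table 4): 4 moderately, 1 unrelated, 1 related to a different finding.
item_38a = (
    [("moderately supported", CSALT)] * 4
    + [("unrelated", None)]
    + [("moderately supported", OTHER)]  # strength assumed
)

# Invented item with an even split, to show a case that misses the cutoff.
split_item = [("moderately supported", CSALT)] * 3 + [("unrelated", None)] * 3

print(supported_finding(item_5a) == CSALT)            # True
print(supported_finding(item_38a) == CSALT)           # True
print(supported_finding(split_item))                  # None
print(supported_finding(split_item, threshold=3) == CSALT)  # True
```

Counting votes per finding, rather than pooling all supporting votes, reflects the wording "supported by a particular research finding": six panelists who each cite a different paragraph would not make an item supported under this reading.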

Seventy-one items (68%) can be considered "unrelated to the research
cited": 56 items were classified by all six panel members as unrelated and
15 were classified unrelated by four or five panelists.

None of the items were classified "moderately contradictory" or
"strongly contradictory." Although some panelists used the moderately
contradictory classification, none of the items were judged contradictory to
research by four or more panelists.

Panelists also identified, from among the items they classified as
supported by research, the items for which knowledge of the related research
findings was important in selecting the keyed option. There were no items
identified by all six panelists as meeting this criterion. One item was
identified by five panelists, and eight items were identified by four
panelists as items for which knowledge of the related research was important
in selecting the correct answer.

Table 3
Research Findings Associated With TPK Items Classified "Supported by Research"

Research Findings                                          (a)    (b)

Quantity and Pacing of Instruction
  Opportunity to Learn/Content Covered
  Role Definition/Expectations/Time Allocation
  Classroom Management/Student Engaged Time
  Consistent Success/Academic Learning Time                 5      2
  Active Teaching
Whole Class vs. Small Group vs. Individualized
  Instruction                                               3      3
Giving Information
  Structuring                                               3      2
  Redundancy/Sequencing
  Clarity
  Enthusiasm
  Pacing/Wait-Time
Questioning the Students
  Difficulty Level of Questions                             1
  Cognitive Level of Questions                              1
  Clarity of Question
  Post-Question Wait-Time
  Selecting the Respondent
  Waiting for the Student to Respond
Reacting to Student Responses
  Reactions to Correct Responses                            3      1
  Reacting to Partly Correct Responses
  Reacting to Incorrect Responses
  Reacting to "No Response"
  Reacting to Student Questions and Comments                1
Handling Seatwork and Homework Assignments                  1
Context-Specific Findings
  Grade Level
  Student SES/Ability/Affect                                1
  Teacher's Intentions/Objectives
Other

(a) Total number of items classified "supported by research" and linked with
research finding. (b) Of those classified "supported by research," total
number for which knowledge of the research finding was judged important in
selecting the keyed option.

Table 4
Consistent Success/Academic Learning Time

Research Finding (a)

Consistent Success/Academic Learning Time. To learn efficiently, students
must be engaged in activities that are appropriate in difficulty level and
otherwise suited to their current achievement levels and needs. It is
important not only to maximize content coverage by pacing the students
briskly through the curriculum, but also to see that they make continuous
progress all along the way, moving through small steps with high (or at
least moderate) rates of success and minimal confusion or frustration. If
lessons are to run smoothly without loss of momentum and students are to
work on assignments with high levels of success, teachers must be effective
in diagnosing learning needs and prescribing appropriate activities. Their
questions must usually (about 75% of the time) yield correct answers and
seldom yield no response at all, and their seatwork activities must be
completed with 90-100% success by most students. (Such high success rates
should not be taken as suggestive of instructional overkill or assignment of
pointless busywork. Appropriate seatwork will extend knowledge and provide
needed practice. It will also be do-able, however, because it is pitched at
the right level and because students have been prepared for it. Thus the
high success rates result from effort and thought, not mere automatic
application of already overlearned algorithms). Continuous progress at high
rates of success, carried to the point that performance objectives can be
met smoothly and rapidly, is especially important in the early grades and
whenever students are learning basic knowledge or skills that will be
applied later in higher-level activities. (Brophy and Good, 1986, p. 360-361)

Associated TPK Items

* 50A. Of the following, the most important element in the effective use of
individualized instruction is:

  (A) effective communication between a student's parents and teachers
  (B) the establishment of appropriate evaluation standards
♦ (C) accurate diagnosis and prescription of learning
  (D) the availability of attractive instructional materials
  (E) the identification of possible information resources

Classifications (b): Strongly supported - 2; Moderately supported - 4

6B. Which of the following should receive consideration by a teacher who is
preparing a reading list from which students select required reading
materials?

  I.   Student interests
  II.  Reading level of the selections
  III. Availability of the selections
  IV.  Community resources

  (A) I and II only
  (B) I and IV only
  (C) I, II, and III only
  (D) II, III, and IV only
♦ (E) I, II, III, and IV

Classifications: Strongly supported - 1; Moderately supported - 4;
Unrelated - 1

* 5A. Research indicates that in classrooms where effective teaching and
learning occur, the teacher is likely to be doing which of the following
consistently?

  (A) Gearing instruction to the typical student at a given grade level
  (B) Carefully grouping students at the beginning of the school year and
      making sure that these groups remain the same throughout the year
  (C) Identifying the affective behaviors that students are likely to
      exhibit at a given level of development
  (D) Working diligently with students to make sure that each learns all
      the material planned for the class for the year
♦ (E) Pacing instruction so that students can [move] ahead when they are
      able to or receive extra help when they need it

Classifications: Strongly supported - 3; Moderately supported - 1;
Related to a different research finding - 2

13A. Good instructional planning is built around the idea that what learners
will learn is most often determined by

  (A) what they should know
  (B) what their teacher knows
♦ (C) how and why they learn
  (D) who does the teaching
  (E) what parents and administrators desire

Classifications: Moderately supported - 5; Related to a different research
finding - 1

38A. A policy of equal educational opportunity obligates the teacher in
which of the following ways?

  (A) Every child must be taught the same things.
  (B) All children must be treated alike.
  (C) Instruction must exclude use of multi-cultural learning materials.
  (D) Every class must have a proportionate minority population.
♦ (E) Instructional strategies must be adapted to the individual.

Classifications: Moderately supported - 4; Unrelated - 1;
Related to a different research finding - 1

Note: Items marked with asterisks are those for which knowledge of the
related research was judged important in selecting the keyed option.

(a) The research about consistent success/academic learning time is
discussed in three paragraphs in the "Summary and Integration of the
Findings" section of the Brophy and Good review. However, in classifying TPK
items, panelists cited only the first paragraph of the discussion, which is
reproduced here. (b) Indicates number of panelists assigning each
classification.

The results of Phase II would have differed slightly if a different
criterion had been used, specifically, if an item were considered "supported
by research" when three, rather than four, panelists classified the item as
either strongly or moderately supported by a particular research finding.
Using the "three of six" criterion, 28 rather than 20 items (27% rather than
19%) would be considered supported by research. Of the eight additional
items, five were paired with the research finding about consistent
success/academic learning time (the finding cited most often under the
previous criterion), and two items were paired with other research findings
already cited.
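
The effect of the consensus threshold can be illustrated with a minimal
sketch. It assumes that each item's panel ratings are reduced to a count of
the panelists (out of six) who rated the item strongly or moderately
supported, and that those ratings refer to the same research finding. The
function names and data layout are hypothetical and are not part of the
study's procedure; the vote counts are those reported for sample items 5A,
13A, and 38A, and the percentages use the 104-item total.

```python
# Hypothetical sketch: the data layout and function names are illustrative,
# not taken from the study.
TOTAL_ITEMS = 104  # number of TPK items reviewed in Phase II

# Panelists (out of six) who rated each sample item as strongly or
# moderately supported; counts come from the reported classifications.
supporting_votes = {"5A": 3 + 1, "13A": 5, "38A": 4}


def supported_items(votes, threshold):
    """Return the items whose supporting votes meet the threshold."""
    return sorted(item for item, n in votes.items() if n >= threshold)


def percent_of_test(item_count, total=TOTAL_ITEMS):
    """Express a count of items as a whole-number percentage of the test."""
    return round(100 * item_count / total)


print(supported_items(supporting_votes, threshold=4))
print(percent_of_test(20), percent_of_test(28))
```

Lowering the threshold from four to three can only add items to the
supported set, which is why the count rises from 20 to 28 under the
alternative criterion.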



Limitations of the Study

Conclusions that may be drawn from this study are subject to certain
limitations imposed by the Delphi method, and by the use of a single form of
the NTE and a single review of research in Phase II.

All Delphi studies are subject to an a priori limitation: the
judgment achieved through the Delphi method represents a consensus among
experts, but there is no guarantee that it represents the "best" judgment.
In addition, in this study, Phase II panelists classifying items as
supported by research were limited to the research findings included in the
Brophy and Good review. If the Phase I panel had chosen a different review
or an additional review, there would have been differences in the item
classifications. However, at least one researcher's informal analysis of
the content of the TPK has yielded results that correspond to those
generated in this study: Darling-Hammond (1986) concluded that "less than
10% of over 100 questions required knowledge of theory, research, or facts
pertaining to teaching and learning" (p. 20).

Discussion 

In commenting on the extent to which knowledge of the research
findings is important in selecting the keyed answer for NTE items, one
panelist said, "...the questions seem to be of the 'common sense'
variety..."; another panelist stated, "I can see how someone with good
general knowledge (higher SAT scores) and good test-taking ability would be
able to do well without exposure to educational research." In these
comments, the panelists echo critics who have suggested that the NTE Test of
Professional Knowledge measures something other than teachers' professional
knowledge. Evidence provided by Andrews, Blackmon, and Mackey (1980),
Miller, Poggio, and Glasnapp (1987), Loadman (1987), Lovelace and Martin
(1984), Pitcher (cited in Wilson, 1986), and Weber and McBee (1987) supports
Nelsen's (1985) conclusion that

performance variations may be largely attributable to factors such as 
general intelligence, scholastic aptitude, overall academic 
achievement, and multiple-choice test item reasoning skills, rather 
than to the extent of instruction or mastery of particular domains of 
the curriculum, such as professional education. (p. 1066)

To the extent to which the test measures such factors as general
intelligence rather than teachers' professional knowledge, the TPK can be
regarded as lacking in educational importance. The educational importance
of a test can be questioned when the test measures something unimportant or
fails to measure something important (Cronbach, 1971). Insofar as it
measures scholastic aptitude and overall academic achievement, the TPK might
be regarded as educationally unimportant, not because these factors are
unimportant in initial teacher certification decisions, but because other
measures of them (e.g., SAT or GRE scores) already exist as part of the
educational record of nearly every applicant for teacher certification. In
addition, judging from the results of this study, the TPK fails to measure
important aspects of teachers' professional knowledge. Although the TPK
contains 20 items judged by panelists to be related to 10 research findings,
the panelists estimated that for 11 of those items (55%), knowledge of the
related research was not important in selecting the correct answer. In
covering part of the professional knowledge base for teachers with items
requiring only good general knowledge and/or common sense, the TPK misses
opportunities to measure some important aspects of teachers' professional
knowledge. For example, consider the following item:

3A. Each term, a teacher provides book lists from which the students
choose books about which they will write book reports. Some topics
seem to appeal more to girls and others appeal more to boys. The
teacher could best help the students find books that will most likely
appeal to them by doing which of the following?

(A) Listing all of the books by reading level

* (B) Listing all of the books by subject

(C) Grouping all of the books by length

(D) Making up one list for boys and another list for girls

(E) Making up one list of books by male authors and another by
female authors (ETS, 1984, p. 109)

This item was classified by four of the six panelists as moderately
supported by the research on structuring. One panelist commented that
"Paragraph 14 - structuring - does deal with the skills of presenting
information and structuring techniques. There is a slight connection:
this listing could be an advance organizer." The discussion of structuring
in the summary of the Brophy and Good review mentions, in addition to
advance organizers, "overviews, review of objectives; outlining the content
and signaling transitions between lesson parts; calling attention to main
ideas; summarizing subparts of the lesson as it proceeds; reviewing main
ideas at the end;...using organizing concepts, analogies,
[and]...rule-example-rule patterns" (p. 362). While there may be debate
about whether the item relates to advance organizers, it seems clear that
answering it correctly does not depend on teachers' professional knowledge
about structuring.



Although there are two other TPK items considered supported by
research and linked to the research on structuring, they too measure
knowledge of structuring only rudimentarily. Those two items are
particularly noteworthy because they were the only items in this study
classified by all six panel members as strongly supported by research. The
items are presented below.



12A. A fourth grade class is going to visit a museum for the first
time. In order to prepare the students to learn from the experience,
the teacher should do which of the following?

I. Give the pupils a set of questions about the exhibits in an
effort to focus their attention during the visit.

II. Tell pupils about museums: what they are and why people visit
them.

III. Have a lesson about some of the exhibits pupils will see on the
trip.

IV. Tell the pupils the field trip will be a test of their ability
to practice good citizenship.

(A) I only

(B) II only

(C) I and IV only

* (D) I, II, and III only

(E) II, III, and IV only (ETS, 1984, p. 111)



40A. Which of the following, if given to high school students at the
beginning of a new course, is an example of an advance organizer?

(A) A list of books required to do the supplementary reading

* (B) An overview of the course that includes objectives and assessment
criteria

(C) An essay assignment to determine levels of writing skill in the
class

(D) A lecture about discipline and behavior standards in the classroom

(E) A reading test to determine the students' ability to read material
in the content field (ETS, 1984, p. 116)

Both items were judged by four of the six panelists as items for
which knowledge of the research finding was important in selecting the keyed
option. The first item, 12A, measures an aspect of structuring that is not
closely related to the specific and complex elements of structuring
described by Brophy and Good as part of lesson presentation. The second
item, 40A, was cited by Darling-Hammond (1986) to demonstrate the very
elementary nature of even those few TPK items that do require "knowledge of
theory, research, or facts pertaining to teaching and learning" (p. 20).

The research finding on structuring and the three TPK items
associated with it deserve careful consideration because only one research
finding was associated with more items (Consistent Success/Academic Learning
Time, associated with five items) and because no other items were judged to
be as strongly supported by a research finding. However, it would appear
that even these items cannot be cited as evidence of the educational
importance of the TPK, because they measure less important aspects of the
topic and fail to measure more important elements. As Darling-Hammond
(1986) has said, the TPK is "limited...by the scarcity of important teaching
questions answerable in multiple-choice formats; the questions with clear,
correct answers are not very profound" (p. 21). The items associated with
structuring may demonstrate what Bracey (1987) has described as the tendency
for minimum competency tests to emphasize trivial objectives at the expense
of more difficult aspects of the curriculum which may be harder to assess.

A lack of evidence supporting the educational importance of the TPK
raises questions of construct validation. Although NTE publications provide
evidence of content validation and do not discuss construct validation, some
critics (e.g., Madaus and Pullin, 1987; Nelsen, 1985) have argued that
construct validation is essential for a test such as the NTE. Standards for
Educational and Psychological Testing (AERA, APA, and NCME, 1985) includes a
description of test validation as a process that requires evidence of
content-, criterion-, and construct-related validity. It is interesting to
note that shortly before the release of the revised Core Battery, an NTE
staffer co-authored an article which included the statement that "convincing
arguments place construct validity at the heart of questions involving test
interpretation and use, thus making it an imperative adjunct to any future
research or operational effort to improve the NTE" (Rosner and Howey, 1982,
p. 7).

It seems reasonable to assume that a measure of a construct
involving teachers' professional knowledge would necessarily include items
designed to measure the ability to apply knowledge derived from research on
teacher effectiveness. Because a majority of the items classified as
related to research could be answered correctly without knowledge of the
research, the construct underlying performance on the TPK would seem to
involve this component of teachers' professional knowledge only minimally
and to be related instead to factors that are not specific to teachers'
professional knowledge, such as "general intelligence, scholastic aptitude,
overall academic achievement, and multiple-choice test item reasoning
skills" (Nelsen, 1985, p. 1066).

The extent to which the TPK measures such factors while neglecting
some elements of teachers' professional knowledge can be viewed as troubling
in light of the role of the TPK in shaping teacher education curricula. In
22 states, applicants for teacher certification must pass the NTE; in some
states, teacher education programs with a specified percentage of graduates
who do not pass the NTE are threatened with loss of their state approval
(Goodison, 1986). Faculty members in teacher education programs want their
graduates to earn certification, so it is inevitable that the NTE will have
an influence on teacher education curricula. The extent of this influence
is suggested in a New York State Education Department memorandum reporting
the results of a survey of efforts made by colleges to aid members of
minority groups in passing the NTE. The memorandum lists nine activities
reported by colleges, including "Revision of the curriculum to reflect
knowledge necessary to pass the NTE, especially in courses devoted to the
teaching-learning process" and "Offering a two-credit course in preparation
for the NTE" (Van Ryn, 1987, p. 4).

Acknowledging the power of the NTE to shape teacher preparation
curricula, Shulman (1987) has argued that initial teacher certification
tests

must become tests worth teaching for. The traditional criteria of
reliability and validity are no longer sufficient. As long as
assessments drive instruction, assessment designers have a moral
obligation to create instruments that correspond to appropriate images
of excellent professional preparation and practice. (p. 44)



Conclusions 

It is unfortunate that the NTE Test of Professional Knowledge,
adopted by 22 states in the wake of the educational reform movement, may
have the unintended effect of impeding other parts of that movement.
Considerable effort has been devoted to building an understanding of the
professional nature of teachers' work, and to countering a public perception
that reasonably competent adults do not need special preparation to become
effective teachers. In offering a test of professional knowledge on which
only 9% of the items were judged to require knowledge of research in teacher
effectiveness, the NTE may be seen as reinforcing the notion that teachers'
professional knowledge is little more than good general knowledge and common
sense. This simplistic view of teaching may lead to a superficial
definition of the professional knowledge base and threaten efforts to
enhance the professional status of teaching.

Recently, ETS acknowledged the limitations of the current NTE and
announced plans to replace it with a "radically different" test that will be
available to states by 1992 (Olson, 1988, p. 1). The new test is expected
to differ from the current NTE in its use of advances in technology to allow
for adaptive testing, and in the timetable for test administration. Unlike
the current NTE, which can be completed in a single day at any point before
certification, the new (and still unnamed) test will be administered at
three separate stages in a teacher's career: after the sophomore year, a
computerized diagnostic battery will assess basic skills; at the end of the
teacher-education sequence, a paper-and-pencil test will measure knowledge
of content and pedagogy; and after a substantial practice teaching
experience or internship, a performance test will assess the ability to
teach a given content area in a classroom setting. The performance test may
be supplemented with computer simulation exercises and with portfolios that
document a teacher's work (Dwyer, 1988).

Given that the president of ETS, Gregory Anrig, described the test
as "radically different" and called the test development process a "full
court press" representing a "high risk" for ETS (Olson, 1988, p. 27), the
new test could be a significant departure from earlier revisions. However,
it seems likely that the portion of the new test designed to assess
teachers' knowledge of pedagogy will continue to resemble the current Test
of Professional Knowledge. Like the TPK, the new test will be administered
when a prospective teacher has completed the teacher education sequence but
before he or she has substantial classroom experience. Until recently, ETS
described it as a paper-and-pencil test using a multiple-choice format like
the current professional knowledge test (Dwyer, 1988); however, it now
appears that ETS is exploring the inclusion of some constructed response
items, such as short-answer items, along with the multiple-choice items
(Fiero, 1990).

Some critics contend that no objective test is likely to yield an
adequate measure of teachers' professional knowledge: "In general, the
state of the art does not permit objective tests for directly measuring
higher order thinking skills, problem solving strategies, and metacognitive
abilities involved in tasks such as teaching" (Frederiksen and Collins,
1989, p. 29). Advances in technology, such as the interactive videodisc,
may soon be applied to assessment methods and allow for improved measurement
of complex knowledge and skills. But for several reasons, including cost
considerations and lack of equal access to computers in many areas where the
test will be used, teachers' professional knowledge may continue to be
measured primarily using traditional assessment methods. And, to extend a
caution advanced by Renfrow and Cromrey (1990), such changes in format could
be cosmetic, enhancing face validity only.

If the new test is to be modeled to some extent on the current TPK,
the data generated in this study may provide the test writers with some new
perspectives on the content of a test of teachers' professional knowledge.
Darling-Hammond (1986) has observed that many of the TPK items require
examinees to "choose a teaching technique in response to short scenarios
that give insufficient information to make a truly reasoned judgement. ...a
thoughtful, honest, and knowledgeable teacher would in most cases have to
answer, 'It depends.'" (p. 46). In describing challenges facing the new NTE,
Dwyer (1989) has identified "the need to contextualize the assessment and to
bring it closer to specific teaching situations" (p. 36). However, as long
as the new test of pedagogy remains a paper-and-pencil test using primarily
multiple-choice items, test developers may wish to consider including more
items that test knowledge of research findings at the knowledge level rather
than the application level.

More than half of the items on the current TPK are at the
application level (ETS, 1984) and include brief descriptions of classroom
situations. ETS's decision to test teachers at the application level is
understandable, particularly in light of test users' demands that licensure
exams demonstrate job relevance. However, there may be insurmountable
difficulties inherent in using multiple-choice items to measure teachers'
ability to apply professional knowledge, particularly knowledge derived from
research on teaching. Research seldom, if ever, yields direct rules for
practice. Effective teaching is highly context-sensitive, and
recommendations drawn from research that are effective in one context may be
ineffective or counterproductive in another setting. (Darling-Hammond's
comment that the correct answer to most TPK items is "It depends" reflects
this reality.) Although findings from research on teaching cannot provide
prospective teachers with a recipe to follow in any given classroom
situation, they can help teachers analyze classroom events and formulate
plans for action that are based on more than intuition. A primary value of
the findings from teacher effectiveness research is that, as Gage (1985) has
said, "(they) give teachers something to reason with and about."

Describing teaching contexts in sufficient detail to allow accurate
measurement of the ability to apply knowledge of research on teaching to
classroom situations seems to be a task that is unlikely to be accomplished
using traditional assessment methods. (However, it may be possible to do
this with interactive videodiscs; the technology currently available seems
well suited to representing classrooms in all their vitality and complexity,
directing examinees' attention to one aspect of the classroom situation and
posing a question about it, then presenting subsequent items based on
examinees' responses to earlier ones.) Since multiple-choice items are
likely to continue to be a mainstay of the successor to the TPK, it may be
most appropriate to use them to determine if examinees can answer factual
questions about key concepts such as academic learning time, structuring, or
wait time. In using multiple-choice items to test knowledge of research
findings at the knowledge level only, item writers can avoid application
items that suggest there is a single correct response to a given classroom
situation, as well as items that are so general that they can be answered
correctly by examinees who do not have knowledge of the underlying concept.

Clearly, testing knowledge of research on teaching at the knowledge 
level only is not a satisfactory long term solution. In preparing good 
application items, developers of the new NTE may want to consider variations 
on traditional items and new strategies for item validation. Norris (1989), 
in discussing the development of tests of critical thinking, has suggested 
strategies that may be adapted successfully to testing teachers' 
professional knowledge. 

In a modification of an objective test, Norris has asked examinees
to justify, orally or in writing, their answers to multiple-choice items.
When generating application items, NTE developers might consider similar
two-part items. A standard multiple-choice question might be followed by a
second question, also using a multiple-choice format, that asks for
justification for the first response, as a means of determining the data or
reasoning an examinee used in selecting one option over another.
Presumably, an examinee who selected the right option for the wrong reason
would be penalized.
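
One way such a two-part item might be scored is sketched below. The scoring
rule (credit only when both the answer and the justification match the key,
and a penalty when a correct answer is paired with an incorrect
justification) is an assumption made for illustration; neither Norris nor
this paper specifies point values, and all names in the sketch are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPartItem:
    answer_key: str         # keyed option for the teaching question
    justification_key: str  # keyed option for the follow-up "why" question


def score(item, answer, justification, penalty=-1):
    """Score one response; the point values here are assumed, not sourced."""
    right_answer = answer == item.answer_key
    right_reason = justification == item.justification_key
    if right_answer and right_reason:
        return 1
    if right_answer and not right_reason:
        return penalty  # right option chosen for the wrong reason
    return 0


item = TwoPartItem(answer_key="B", justification_key="D")
print(score(item, "B", "D"), score(item, "B", "A"), score(item, "C", "D"))
```

Whether a wrong reason should cost points or simply earn none is a design
choice left open here; the penalty parameter makes that choice explicit.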

Norris (1989) also has identified strategies that may be useful in
validating application items, specifically, asking examinees to think aloud
while working on items and asking them to describe how they used particular
pieces of information presented in the item in selecting an answer. Data
gathered in this way can be used in modifying application items when, for
example, it appears that examinees have selected an option other than the
key, despite having recognized and used the relevant research, or when
examinees do not give evidence of using research findings but still are able
to arrive at the keyed response to an item designated as requiring
application of knowledge derived from research on teaching.
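
The two signals for revising an item can be expressed as a simple check over
think-aloud records. This is a sketch under stated assumptions: each record
is reduced to two hypothetical yes/no judgments (whether the examinee used
the relevant research, and whether the examinee chose the key), which a
human coder would have to supply from the protocol.

```python
def revision_flags(records):
    """Return the kinds of problem observed for one application item.

    Each record is a (used_research, chose_key) pair of booleans coded
    from an examinee's think-aloud protocol (a hypothetical encoding).
    """
    flags = set()
    for used_research, chose_key in records:
        if used_research and not chose_key:
            flags.add("research used but key missed")
        elif chose_key and not used_research:
            flags.add("key reached without research")
    return flags


records = [(True, True), (False, True), (True, False)]
print(sorted(revision_flags(records)))
```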

Recommendations for Further Research 
Additional research related to this topic might include: variations 
on the present study; a survey of the ways research on teaching currently is 
presented in teacher preparation curricula, which could lead to new TPK 
items reflective of exemplary practice in teacher education; and the 
exploration of ways to assess the professional knowledge of beginning 
teachers using formats other than multiple-choice test items. 

Variations on this study might generate results that are different
from those reported here and suggest other interpretations of how findings
from research on teacher effectiveness are represented on the Test of
Professional Knowledge. For example, the use of a different or additional
review of research in Phase II could lead to results suggesting more or less
congruence between findings from research on teacher effectiveness and TPK
items, or to different understandings of the extent to which certain
findings are represented. Another valuable variation would be the
presentation of TPK items without the key. In the current study, all items
were presented with the keyed option marked by an asterisk; it might be
useful to see if not directing panelists to the intent of an item in that
way would lead to differences in the extent of agreement among panelists.

Stallings (1984) has noted that findings from research on teaching
were disseminated first to practicing teachers through inservice education,
and only later to preservice teachers in teacher education programs. This
lag in dissemination may help account for the relatively weak representation
of findings from research on teaching on the NTE Test of Professional
Knowledge, because validation studies (conducted primarily in the early
1980s) involved comparison of the test items with the content of teacher
preparation programs. Given the increased attention to the importance of
including findings from research on teaching in teacher education curricula
(for example, the recommendations for reform of teacher education issued by
the National Commission for Excellence in Teacher Education [1985]), it is
reasonable to expect that the content of teacher education programs has been
modified considerably since the earliest validation studies were conducted.
A survey of the ways teacher educators present findings from research on
teaching and assess mastery of that portion of the body of professional
knowledge could lead to the identification of exemplars of outstanding
practice for the benefit of both teacher educators and TPK item writers.

It is likely that a meaningful test of a beginning teacher's
professional knowledge will measure the ability to apply learning theory,
knowledge of child or adolescent development, and recommendations drawn from
research on teaching in ways that are context-sensitive, that is, in ways
that respond to differences in student ability level and educational setting
and are appropriate to particular subject areas and grade levels. This will
be extremely difficult to accomplish within the limits of a paper-and-pencil
test using primarily a multiple-choice format. Lee Shulman (1987) and his
colleagues on the Teacher Assessment Project (TAP) are currently developing
prototypes for assessing the competence of experienced teachers who will
seek voluntary professional certification through the National Board of
Professional Teaching Standards. ETS should be commended for seeking to
apply some of the ideas from the TAP to assessment for entry-level teachers
and encouraged to continue to explore alternate formats for the measurement
of teachers' professional knowledge in ways that are meaningful and
representative of classroom practice.






References



American Educational Research Association, American Psychological Association,
& National Council on Measurement in Education. (1985). Standards for
Educational and Psychological Testing. Washington, DC: American
Psychological Association.

Andrews, J. W., Blackmon, C. R., & Mackey, J. A. (1980). Preservice performance
and the National Teacher Exams. Phi Delta Kappan, 61, 358-359.

Berliner, D. C. (1984). The half-full glass: A review of research on teaching.
In P. L. Hosford (Ed.), Using what we know about teaching (pp. 51-77).
Alexandria, VA: Association for Supervision and Curriculum Development.

Bracey, G. W. (1987). Measurement-driven instruction: Catchy phrase, dangerous
practice. Phi Delta Kappan, 68, 683-686.

Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In
M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp.
328-375). New York: Macmillan.

Clark, C. M. (1984). Research on teaching and the content of teacher education
programs: An optimistic view. (Occasional Paper No. 75). East Lansing, MI:
Michigan State University, Institute for Research on Teaching.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.),
Educational Measurement (2nd ed., pp. 443-507). Washington, DC: American
Council on Education.

Darling-Hammond, L. (1986). Teaching knowledge: How do we test it? American
Educator, 46, 18-21.

Doyle, W. (1982). Research on classroom contexts: Toward a knowledge base for
policy and practice in teacher education. In G. A. Griffin & H. Hukill
(Eds.), Alternate Perspectives for Program Development and Research in
Teacher Education (pp. 75-94). Austin, TX: University of Texas at Austin,
Research and Development Center for Teacher Education. (ERIC Document
Reproduction Service No. ED 223 578)

Doyle, W. (1986). Classroom organization and management. In M. C. Wittrock
(Ed.), Handbook of research on teaching (3rd ed., pp. 392-431). New York:
Macmillan.

Dwyer, C. A. (1988, November). A new generation of tests for teacher licensure.
Paper presented at the annual meeting of the Northeastern Research
Association, Ellenville, NY.

Dwyer, C. A. (1989). A new generation of tests for licensing beginning teachers.
New Directions for Teacher Assessment: Proceedings of the 1988 ETS
Invitational Conference (pp. 29-38). Princeton, NJ: Educational Testing
Service.

Educational Testing Service. (1984). A guide to the NTE Core Battery Tests.
Princeton, NJ: Author.




Egbert, R. L. (1984). The role of research in teacher education. In R. L. Egbert
and M. M. Kluender (Eds.), Using research to improve teacher education: The
Nebraska consortium. (Teacher Education Monograph No. 1, pp. 9-21).
Washington, DC: ERIC Clearinghouse on Teacher Education.

Fiero, D. (1990). Working description of Stage II. Working papers toward a new
generation of teacher assessments. Princeton, NJ: Educational Testing
Service.

Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational
testing. Educational Researcher, 18, 27-32.

Gage, N. L. (1978). The scientific basis of the art of teaching. New York:
Teachers College Press.

Gage, N. L. (1985). Hard gains in the soft sciences: The case of pedagogy.
Bloomington, IN: Phi Delta Kappa, Center on Evaluation, Development, and
Research.

Good, T. L. (1983). Research on classroom teaching. In L. S. Shulman & G. A.
Sykes (Eds.), Handbook of Teaching and Policy (pp. 42-80). New York:
Longman.

Goodison, M. (1986). Teacher testing: Improving the outcomes. Paper presented
at the annual meeting of the American Educational Research Association, San
Francisco.

Hersh, R. M. (1982). What makes some schools and teachers more effective. In D.
C. Corrigan (Ed.), The future of teacher education: Needed research and
practice (pp. 1-9). College Station, TX: Texas A & M University, College of
Education. (ERIC Document Reproduction Service No. ED 225 997)

Hindman, S. E., & Polsgrove, L. (1988). Differential effects of feedback on
preservice teacher behavior. Teacher Education and Special Education, 11,
25-29.

Hosford, P. L. (1984). The art of applying the science of education. In P. L.
Hosford (Ed.), Using what we know about teaching (pp. 141-161). Alexandria,
VA: Association for Supervision and Curriculum Development.

Hunter, M. (1984). Knowing, teaching, and supervising. In P. L. Hosford (Ed.),
Using what we know about teaching (pp. 169-192). Alexandria, VA:
Association for Supervision and Curriculum Development.

Kluender, M. M. (1984). The Nebraska consortium: Using research to improve
teacher education. In R. L. Egbert and M. M. Kluender (Eds.), Using research
to improve teacher education: The Nebraska consortium. (Teacher Education
Monograph No. 1, pp. 1-8). Washington, DC: ERIC Clearinghouse on Teacher
Education.

Linstone, H. A., & Turoff, M. (Eds.). (1975). The Delphi method: Techniques and
applications. Reading, MA: Addison-Wesley.

Loadman, W. E. (1987, April). Use of the NTE as a measure of competence for
teacher certification. Paper presented at the annual meeting of the American
Educational Research Association, Washington, DC.



Lovelace, T. & Martin, C. E. (1984). The revised National Teacher Examinations 
?f ? Prftii(Lior Of teachers^ oerforreance in public school classroais. 
Lafayette, LA: University oi Southwestern Louisiana. (ERIC Document 
Reproduction Service No. ED 251 4U) 

Madaus, G. F. it Pull in, D. (1987). Teacher certification tests: Do they really 
measure what we need to know? Phi Delta Kapoan. 69, 31-38. 

Miller, M. D., Poggio, J. P., & Glasnapp, D. R. (1987). Teachers' professional 
knowledge: Are we measuring acquired skills or common knowledge? Journal of
Personnel Evaluation in Education, 1, 57-68.

National Commission for Excellence in Teacher Education. (1985). A call for
change in teacher education. Washington, DC: American Association of
Colleges for Teacher Education.

Nelsen, E. A. (1985). Review of NTE programs. In J. V. Mitchell (Ed.), The
ninth mental measurements yearbook (pp. 1063-11166). Lincoln, NE: Buros
Institute of Mental Measurements of the University of Nebraska - Lincoln.

Norris, S. P. (1989). Can we test validly for critical thinking? Educational 
Researcher, 18, 21-26.

Olson, L. (1988, November 2). 'Different' tests of teaching skill planned by
firm. Education Week, pp. 1, 27.

Pipho, C. (1986). States move reform closer to reality. Phi Delta Kappan, 68,
K1-K8.

Rosenshine, B. & Stevens, R. (1986). Teaching functions. In M. C. Wittrock 
(Ed.), Handbook of research on teaching (3rd ed., pp. 376-391). New York:
Macmillan.

Rosner, F. C. & Howey, K. R. (1982). Construct validity in assessing teacher
knowledge: New NTE interpretations. Journal of Teacher Education, 33(6),
7-12.

Rudner, L. M. (1987). Questions and answers concerning teacher testing. In L.
M. Rudner (Ed.), What's happening in teacher testing: An analysis of state
teacher testing practices (pp. 3-8). Washington, D.C.: U.S. Department of
Education, Office of Educational Research and Improvement. 

Shulman, L. S. (1987). Assessment for teaching: An initiative for the
profession. Phi Delta Kappan, 69, 38-44.

Smith, B. O. (1985). Research bases for teacher education. Phi Delta Kappan,
66, 685-690.

Smith, D. C. (1982, May). The content in teacher education programs. Paper
presented at the Conference on the Future of Teacher Education: Needed
Research and Practice, College Station, TX. (ERIC Document Reproduction
Service No. ED 217 029)

Smith, D. C. (1983, February). PROTEACH: An extended preservice teacher
preparation program. Paper presented at the annual meeting of the American
Association of Colleges for Teacher Education, Detroit. (ERIC Document
Reproduction Service No. ED 232 976)

Stallings, J. A. (1984). Implications from the research on teaching for teacher
preparation. In R. L. Egbert and M. M. Kluender (Eds.), Using research to
improve teacher education: The Nebraska consortium (Teacher Education
Monograph No. 1, pp. 128-146). Washington, DC: ERIC Clearinghouse on Teacher
Education.

Trentham, L., Lauderdale, W., Miller, E., & Rudder, C. (1986, April).
Relationships among the Alabama Initial Teacher Certification Test scores,
college variables, and classroom performance ratings. Paper presented at the
annual meeting of the National Council on Measurement in Education, San
Francisco.

U.S. Department of Education. (1986). What works: Research about teaching and
learning. Washington, D.C.: Author.

U.S. Department of Education. (1987). What works: Research about teaching and
learning (2nd ed.). Washington, D.C.: Author.

Van Ryn, M. (1987, December 8). [Results of a memorandum concerning attraction 
and recruitment of minorities to teaching.] Memorandum from the Assistant 
Commissioner for Higher Education Services, State Education Department,
University of the State of New York. 

Walberg, H. J. (1986). Syntheses of research on teaching. In M. C. Wittrock
(Ed.), Handbook of research on teaching (3rd ed., pp. 214-229). New York:
Macmillan.



Weber, L. J., & McBee, J. K. (1987, April). A challenge to the validity of the
Professional Knowledge Test of the Core Battery of the National Teachers
Examination [sic]: A teacher licensing problem. Paper presented at the
annual meeting of the American Educational Research Association, Washington,
DC.



Wilson, A. J. (1986, April). Historical issues of validity and validation: The
National Teacher Examinations. Paper presented at the annual meeting of the 
American Educational Research Association, San Francisco. (ERIC Document 
Reproduction Service No. ED 270 503) 

Wyne, M. D. & Stuck, G. B. (1982). Time and learning: Implications for the
classroom teacher. Elementary School Journal, 83, 68-75.






